A system of differential equations is proposed that is designed to identify
communities in weighted networks. The input is a symmetric connectivity
matrix Aij. A priori information on the number of communities is not
needed. To verify the dynamics, we prepared sets of separate, fully
connected clusters. In this case, the matrix A has a block structure of
zeros and units. Noise is introduced as positive random numbers added to
zeros and subtracted from units. The task of the dynamics is to reproduce
the initial block structure. In this test, the system outperforms the
modularity algorithm if the number of clusters is larger than four.

PACS numbers: 02.10.Ox, 05.45.-a, 07.05.Tp

1 Introduction

Many human relations can be described with networks [1-10]. We ourselves
are elements of social networks connecting people in the same family,
faculty or club. We can also be involved in networks through the Internet:
we send and receive e-mails, we create WWW pages, and we visit a lot of
them every day. Being employed, we are connected with people who share our
field of interest. These connections are sometimes surprisingly far
reaching, especially nowadays when interdisciplinary tasks are very common.
Other social networks are formed by religious or political ties. Another
family is formed by economic networks; the financial positions of many
firms are connected to each other. Knowledge of the connections between
firms enables an efficient diversification of a portfolio or the
forecasting of risk. Networks of this kind attract interest because of the
potential for enrichment through anticipating the behaviour of the
financial market. Studies of these and many other complex networks require
not only knowledge of a given field, e.g. sociology or economics, but also
skill in applying the methods which enable a correct analysis. A proper
reconstruction of the network of interactions between the elements of
interest allows one to predict selected aspects of the system.

In particular, the problem of identifying communities in networks is
relevant for numerous areas of knowledge. Economists are interested in the
clustering of time series of financial data [11,12]. In biology,
correlations between gene expression levels allow one to infer similar
functions [13, 14]. Sociologists investigate the structure of social
networks, looking for groups and their leaders [15, 16]. However, the
problem is computationally difficult; the time needed by exhaustive
algorithms grows exponentially with the network size [17]. Numerous
approximate solutions have been applied; for a recent review see Ref. [9].

In most of these works, the algorithms applied rely on discrete variables.
Continuous dynamics has been used in networks within mean-field theory,
where the time evolution of a few collective variables is considered. This
approach is presented e.g. in Ref. [18]. Differential equations have also
been applied to describe the time evolution of nodes of random networks
[19]. However, we are not aware of any application of continuous dynamics
to the topology itself, except the physically motivated approach of Wu and
Huberman [20] and our own works on the Heider balance [21,22]. The
algorithm presented in Ref. [20] actually relies on an approximate
calculation of a continuous distribution of electric potential in a
network. The Heider balance is a specific version of the problem because
it yields only two communities. However, as we demonstrate below, this
drawback is easily removed by a reformulation of the model equations. In
our opinion, the continuous description has some advantages which deserve
the modeller's attention. First, in many applications continuous variables
seem to be more natural than discrete ones. Second, with discrete
variables we often have to rely on stochastic dynamics, which in principle
requires averaging over trajectories even if the initial state is given.
Otherwise, we have to present results which cannot be reproduced exactly by
other authors, as they use different series of pseudorandom numbers. The
advantage of discrete models is that the time of calculations is much
shorter.

In this paper we propose a set of differential equations to describe the
time evolution of links in a weighted network. Initially, the weights of
all links are given by real numbers between zero and one. The equations
are designed to drive the weights to 0 or 1; a link is then supposed to
disappear or persist, respectively. The network obtained in this way
reveals the cluster structure which is supposed to be the closest to the
initial one. This kind of evolution is analogous to the task of finding
the closest energy minimum in a complex landscape.

The question remains how to verify a given formalism. Often we refer to
social experiments, such as the Zachary karate club [22,24]. However, if
such agreement appears, it should be treated merely as a nice demonstration
that the scheme of calculations is reasonable rather than as an ultimate
proof. In social experiments, there are so many uncontrolled factors that
an exact prediction of the output is rarely possible; any theory is by
necessity reductionistic, and only a limited part of reality can be taken
into account. (Being aware that usually only qualitative agreement is
possible, social scientists tend to treat a perfect agreement of theory
with experiment as somewhat suspicious.) Another possibility is to compare
many different approaches [15]. However, this method cannot identify a
single best algorithm; it can only eliminate the worst one. Having this in
mind, we designed a numerical test to compare our method with two other
algorithms available in the literature, the modularity algorithm [23] and
the shortest-path algorithm [24]. The test is designed so that the correct
result is known; in this way the above difficulties are avoided.

In the next section we describe the model equations and the test. In
Section 3 we compare the results obtained with the three methods. The last
section is devoted to conclusions.

2 Model equations and the test method 

The input to the calculation is an initial state of the symmetric real
matrix A. Each matrix element Aij refers to the state of the link between
nodes i and j. The time evolution of the matrix elements is determined by

$$\frac{dA_{ij}}{dt} = G(A_{ij}) \sum_{k} \left( A_{ik} A_{kj} - \beta \right) \qquad (1)$$

where G(x) = Θ(x)Θ(1 - x). This product of step functions ensures that the
matrix elements remain in the prescribed range (0,1). The parameter β is a
threshold (see below). The idea is that for each link ij all remaining
nodes k are to be considered one by one. The question is whether the link
ij joins nodes from the same cluster or from different clusters. To decide
this, we consider the products Aik Akj; if both elements Aik and Akj are
large, they are supposed to join nodes in the same cluster; then i and j
are also in the same cluster. In this case, the product Aik Akj is larger
than the threshold value β, and Aij increases. This tendency to increase or
decrease is averaged over all nodes k. As a result, the matrix elements
vary with different velocities; evident parts of the cluster structure are
determined first, and they determine the further evolution.
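The dynamics of Eq. (1) can be illustrated with a minimal numerical sketch.
Several choices here are assumptions made for illustration and are not taken
from the text: the explicit Euler scheme, the step `dt`, the cap `max_steps`,
the function name `evolve_links`, and the stopping rule (run until every
weight has reached 0 or 1). Following the phrase "all remaining nodes k", the
sum is taken over k different from i and j, and the diagonal is not evolved.

```python
import numpy as np

def evolve_links(B, beta=0.25, dt=0.01, max_steps=100_000):
    """Euler integration of dA_ij/dt = G(A_ij) * sum_k (A_ik A_kj - beta).

    Assumptions (not from the paper): Euler scheme, step size, stopping rule.
    """
    A = np.array(B, dtype=float)
    np.fill_diagonal(A, 0.0)              # self-links are not evolved
    n = A.shape[0]
    for _ in range(max_steps):
        # G(A_ij) = Theta(A_ij) * Theta(1 - A_ij): only weights strictly
        # inside (0, 1) keep moving
        active = (A > 0.0) & (A < 1.0)
        if not active.any():
            break
        # With a zero diagonal, (A @ A)_ij is the sum over k != i, j
        drive = A @ A - beta * (n - 2)
        A = np.clip(np.where(active, A + dt * drive, A), 0.0, 1.0)
    return A
```

Calling `evolve_links(B)` on a symmetric matrix with entries in (0,1) returns
a matrix of zeros and ones with a zero diagonal; links that end at 1 are read
as persisting and links that end at 0 as removed.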

In our calculations, the parameter β is fixed as 0.25. This choice is
motivated as follows. Our aim is to discriminate two cases: i) when nodes
i and k belong to the same cluster and nodes k and j belong to two
different clusters, ii) when all three nodes belong to the same cluster.
A priori, Aik and Akj can be treated as independent random variables; if
their distribution is uniform in the range (0,1), the average of their
product is (1/2)^2 = 0.25.
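As a worked check of this estimate, under the same assumption that the two
weights are independent and uniformly distributed on (0,1), the average
factorizes into a product of two means (x and y are placeholder integration
variables, not symbols from the text):

```latex
\langle A_{ik} A_{kj} \rangle
  = \int_0^1 \! \int_0^1 x \, y \; dx \, dy
  = \left( \int_0^1 x \, dx \right)^{2}
  = \left( \tfrac{1}{2} \right)^{2}
  = \tfrac{1}{4} = \beta
```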

To check how the method works, we use a set of R identical, fully connected
clusters ("cliques" [9]), each of K nodes; the total number of nodes is
N = RK. The N x N connectivity matrix of this system has a block structure,
with K x K squares of units along the main diagonal and zeros elsewhere.
This initial structure is hidden when all matrix elements are disturbed;
for each matrix element Aij a random number εij is generated with a uniform
distribution in the range (0, ε). The transformation is
Aij → Bij = B(Aij, εij), where

$$B(x, \epsilon) = (1 - x)\,\epsilon + x\,(x - \epsilon) \qquad (2)$$

In this way, ε is added to zeros and subtracted from units. This noisy
connectivity matrix B serves as the initial condition for Eq. (1). The time
evolution should reproduce the structure of the initial matrix Aij at the
maximum of the modularity Q, defined as [23]

$$Q = \frac{1}{w} \sum_{ij} \left( B_{ij} - \frac{k_i k_j}{w} \right) \delta(c_i, c_j) \qquad (3)$$

where $w = \sum_{ij} B_{ij}$, $k_i = \sum_{j} B_{ij}$, $c_i$ denotes the
cluster of node i, and the factor δ indicates that only pairs of nodes in
the same cluster are taken into account. This quantity allows one to
distinguish a network with a given structure from the equivalent random
network.
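The construction of the test matrix and the evaluation of Q can be sketched
as follows. This is an illustration under stated assumptions, not the
authors' code: the function names, the use of NumPy, and the symmetric
sampling of the noise are choices made here, and the modularity is written
in the normalization Q = (1/w) Σij (Bij - ki kj / w) δ(ci, cj) with
w = Σij Bij, which is algebraically the standard weighted modularity.

```python
import numpy as np

def planted_matrix(R, K):
    """Block matrix of R cliques with K nodes each, plus node labels."""
    labels = np.repeat(np.arange(R), K)
    A = (labels[:, None] == labels[None, :]).astype(float)
    return A, labels

def add_noise(A, eps, rng):
    """Apply Eq. (2): add noise to zeros, subtract it from units."""
    noise = rng.uniform(0.0, eps, size=A.shape)
    noise = np.triu(noise) + np.triu(noise, 1).T   # keep B symmetric
    return (1.0 - A) * noise + A * (A - noise)

def modularity(B, labels):
    """Weighted modularity of the partition given by `labels`."""
    w = B.sum()                                    # w = sum_ij B_ij
    k = B.sum(axis=1)                              # k_i = sum_j B_ij
    same = labels[:, None] == labels[None, :]      # delta(c_i, c_j)
    return ((B - np.outer(k, k) / w) * same).sum() / w
```

For example, `A, labels = planted_matrix(4, 15)` followed by
`B = add_noise(A, 0.3, np.random.default_rng(1))` gives one noisy input
matrix, and `modularity(B, labels)` evaluates Q for the planted partition.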

The same matrix Bij is the input for the modularity algorithm [23] and the
shortest-path algorithm [24]. For a weighted network, the shortest-path
method requires a redefinition of the distance matrix. Our tests indicate
that the best results are obtained if the off-diagonal elements of the
distance matrix are equal to the inverse square of the correlation
coefficient. To find all shortest paths connecting pairs of nodes in the
network, a modified Bellman-Ford algorithm was used [25]. The modification
was necessary because the original version does not take into account all
possible shortest paths. At each iteration, the edge that lies on the
largest number of shortest paths is removed from the network, and the
modified Bellman-Ford algorithm is applied to the resulting network. The
simulation terminates when all edges have been removed from the system.
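The redefinition of distances can be illustrated with a short sketch. It
assumes that the matrix element Bij plays the role of the correlation
coefficient, so that the distance is 1/Bij^2, and it uses the textbook
Bellman-Ford relaxation for a single source. The modification that
enumerates all shortest paths is not specified in the text and is not
reproduced here; the function names are hypothetical.

```python
import numpy as np

def distance_matrix(B):
    """Distances d_ij = 1 / B_ij**2 for B_ij > 0 (assumed convention)."""
    D = np.full(B.shape, np.inf)       # absent links are unreachable
    mask = B > 0
    D[mask] = 1.0 / B[mask] ** 2
    np.fill_diagonal(D, 0.0)
    return D

def bellman_ford(D, source):
    """Shortest distances from `source` on a dense distance matrix."""
    n = D.shape[0]
    dist = np.full(n, np.inf)
    dist[source] = 0.0
    for _ in range(n - 1):
        # Relax every edge at once: best route to j through any node i
        candidate = np.min(dist[:, None] + D, axis=0)
        updated = np.minimum(dist, candidate)
        if np.array_equal(updated, dist):
            break                      # no distance improved; done
        dist = updated
    return dist
```

With this convention a strong link corresponds to a short distance, so a
path through two strong links can be shorter than a single weak link.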

Figure 1: Results for R=2 clusters, each of K=34 nodes. The modularity
algorithm works best.



3 Results 

In Figs. 1-7 we show the calculated probability p that the initial
connectivity matrix Aij is reproduced. In all figures, the curves are
marked as follows: triangles for Eq. (1), crosses for the modularity
algorithm and squares for the shortest-path algorithm. The results are
shown for various N and R. The probability p depends on the maximal value
ε of the random numbers used to introduce the noise. All results are
averaged over 100 matrices. Obviously, for small ε all methods work well;
there, p is equal to unity for all three methods. However, as ε increases,
errors appear and the probability p starts to decrease.
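The way p is estimated can be sketched as a small test harness. The harness
is an assumption about the procedure described above (generate noisy
matrices, run a method, count exact reproductions of the block structure).
The stand-in `threshold_method` is hypothetical and is not one of the three
methods compared in this paper; it is included only so that the sketch runs
on its own.

```python
import numpy as np

def success_probability(method, R, K, eps, trials=100, seed=0):
    """Fraction of noisy matrices for which `method` recovers the blocks."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(R), K)
    A = (labels[:, None] == labels[None, :]).astype(float)
    off = ~np.eye(R * K, dtype=bool)           # compare off-diagonal links
    hits = 0
    for _ in range(trials):
        noise = np.triu(rng.uniform(0.0, eps, size=A.shape), 1)
        noise = noise + noise.T                # symmetric noise
        B = (1.0 - A) * noise + A * (A - noise)    # Eq. (2)
        recovered = method(B)
        hits += np.array_equal(recovered[off], A[off])
    return hits / trials

def threshold_method(B):
    """Hypothetical stand-in: keep a link if its weight exceeds 1/2."""
    return (B > 0.5).astype(float)
```

Any function that maps a noisy matrix to a matrix of zeros and ones can be
passed as `method`; scanning `eps` then gives one curve p(ε) for fixed R
and K.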

As we see, the shortest-path method fails at the smallest noise level. The
modularity method is the best for R=2 and 3. For R=4, the modularity
method and Eq. (1) give approximately the same results. For R=6 and
higher, the probability of success is the largest for the evolution given
by Eq. (1).

Figure 2: Results for R=3 clusters, each of K=24 nodes. The modularity
algorithm works best.

Figure 3: Results for R=4 clusters, each of K=15 nodes. The results of
Eq. (1) are as good as those of the modularity algorithm.

Figure 4: Results for R=4 clusters, each of K=20 nodes. The results of
Eq. (1) are as good as those of the modularity algorithm.

Figure 5: Results for R=6 clusters, each of K=10 nodes. Eq. (1) works best.

Figure 6: Results for R=8 clusters, each of K=9 nodes. Eq. (1) works best.

Figure 7: Results for R=16 clusters, each of K=5 nodes. Eq. (1) works best.

4 Discussion



Our results are preliminary. Further calculations should explore the
parameter space (R, K) more thoroughly. The method should also be checked
for clusters of different sizes which are not fully connected. As the
number of internal links is reduced, the clusters become "less dense".
When the number of strong links between such clusters is too large, the
identification of communities becomes impossible; this is the natural
limit of any method and of the definition of communities as well. It will
be interesting to repeat our comparison in this difficult area. The
drawback is the time of computation; the number of differential equations
to be solved numerically is N(N - 1)/2. This cost is partly offset by the
short time needed to write the code, as the method is conceptually very
simple. Our results indicate that it can be useful when applied to networks
of moderate size but with a larger number of clusters. Also, it can be
immediately generalized to unweighted networks; in this case the links are
described by integers at the beginning and at the end of the calculation.

As can be seen in the presented figures, the method works well in the case
of a large number of clusters, but its efficiency is worse for two or three
clusters. In the latter case the number of triads k where nodes i, j belong
to the same cluster but k does not is relatively large. The product Aik Akj
is lower than the threshold β and these triads produce a decrease of Aij;
this lowers the performance of the whole method when the number of clusters
is small.

There is some relation between the problem of communities and the
infinite-range spin glass problem. However, this relation cannot be reduced
to an identity. For two communities, a spin can be defined as a variable
which marks the community to which a node belongs. However, for a larger
number of communities this analogy does not hold. The set of potentially
stable solutions of the continuous dynamics contains, in particular, the
case when every node belongs to a different cluster. Then, if we try to
apply the spin analogy, the spin dimension should be at least equal to the
number of nodes. It seems more appropriate to work with variables denoting
the states of the links only.